December 9, 2019
Problem Definition
Research Purpose
Feeder Schools
Feeder Schools Detail
PASSNYC: a not-for-profit serving underserved NYC DOE students in the admissions process for the reputable Specialized High Schools.
How can PASSNYC address the diversification of NYC Specialized High Schools using data science?
Models
Features
Recommendation 1:
Increase the number of SHSAT takers by focusing on schools with good or average academics and few takers
SHSAT participation is highly correlated with academic performance. Using linear regression, 33 schools with good or average academics in underperforming districts were identified. Adding approximately 20 candidates per school would yield roughly 660 additional SHSAT candidates (33 × 20).
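One way to picture the screening step is the minimal sketch below. It assumes the regression predicts SHSAT takers from an academic score and flags schools that fall well below the fitted line; the school names, scores, taker counts and the `min_gap` cutoff are all hypothetical and are not taken from the PASSNYC data or from the deck's actual model.

```python
# Hypothetical rows: (school, average academic score, SHSAT takers).
# None of these values come from the PASSNYC dataset.
schools = [
    ("School A", 2.0, 10),
    ("School B", 2.5, 20),
    ("School C", 3.0, 8),
    ("School D", 3.5, 40),
    ("School E", 4.0, 50),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def under_participating(rows, min_gap):
    """Return schools whose takers fall at least min_gap below the fitted line."""
    xs = [row[1] for row in rows]
    ys = [row[2] for row in rows]
    slope, intercept = fit_line(xs, ys)
    flagged = []
    for name, score, takers in rows:
        predicted = slope * score + intercept
        if predicted - takers >= min_gap:  # min_gap is an assumed cutoff
            flagged.append(name)
    return flagged


def projected_candidates(n_schools, per_school):
    """Simple projection: flagged schools times added candidates per school."""
    return n_schools * per_school


print(under_participating(schools, min_gap=10))
print(projected_candidates(33, 20))
```

The last line reproduces the deck's arithmetic: 33 schools at about 20 added candidates each.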
Recommendation 2:
Increase SHSAT pass rates by focusing on schools with many “Level 4 students” and few offers
There are 472 schools that received 0-5 offers per school. Regression models show a high correlation between SHSAT success and Level 4 scores on Common Core exams. These 33 schools are estimated to yield 408 additional SHSAT candidates in predominantly Black/Hispanic schools, which would shift the demographic balance somewhat.
Recommendation 3:
Focus on bright students in underperforming schools with low SHSAT participation and low academics
The common denominator seems to be that these schools all have very low academic performance and a culture that avoids the SHSAT. At least 35 such schools had at least one Level 4 result. Focusing resources on these schools would better prepare students with critical reading and math skills and should yield additional Specialized High School seats.
Features
The elbow method, based on the within-cluster sum of squares, indicates that the optimal number of clusters is 3. Using the 20 features listed above, the k-means algorithm partitions the 472 middle schools with non-demographic data available into 3 clusters with different levels of academic performance, as shown in the following scatterplot.
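The elbow procedure can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the deck's implementation: the points are made-up two-feature data rather than the 20 real features, the farthest-point initialisation is chosen only to make the run deterministic, and picking the elbow by the largest second difference of the WCSS curve is one common heuristic among several.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumption for this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss


def elbow_k(wcss_by_k):
    """Pick the k where the WCSS curve bends most (largest second difference)."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (wcss_by_k[prev_k] - wcss_by_k[k]) - (wcss_by_k[k] - wcss_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical two-feature data with three visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]

wcss_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
best_k = elbow_k(wcss_by_k)
labels, _ = kmeans(points, best_k)
print(best_k, labels)
```

On real school data the features would normally be standardised first, since k-means is sensitive to feature scale.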
KNN-Model
Recommendation 1:
On-campus intervention at 5 schools in Cluster A representing middle schools most likely to have students qualified for SPHS.
Recommendation 2:
Awareness campaign to boost awareness of the SHSAT and SPHS at 48 middle schools in Cluster B.
Recommendation 3:
Regional information sessions and workshops at 3 locations for all schools. Middle schools in the top 25% of the Underrepresentation Score cluster around three locations: Harlem, the Bronx and Brooklyn (Broadway Junction), neighborhoods with a high proportion of Black and Hispanic residents.
Awareness: Organize regional information sessions in Harlem, the Bronx and Brooklyn (Broadway Junction) to direct parents and students to available resources. Organize regional test preparation workshops in these neighborhoods.
Recommendation:
Why a decision tree workflow? To explain the results to business people and partner organizations.
Decision trees help whenever a difficult statistical model has to be explained to stakeholders. A decision tree lays out the complete decision-making process as the series of decisions that leads to each conclusion. Here the focus is on the “yes” class, which means that the particular school is underperforming. This is one of the simplest and most convenient ways to explain a statistical model to partners, or even to a lay audience.
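To illustrate how a tree unfolds into a readable chain of decisions, here is a minimal sketch with a hand-built toy tree. The feature names (`pct_level4_math`, `attendance_rate`), the thresholds and the example school are hypothetical; the actual fitted tree and its splits are not given in this deck.

```python
# Hand-built toy tree; feature names and thresholds are hypothetical.
# The "yes" label marks a school classified as underperforming.
TREE = {
    "feature": "pct_level4_math",
    "threshold": 10.0,
    "below": {
        "feature": "attendance_rate",
        "threshold": 90.0,
        "below": {"label": "yes"},
        "above": {"label": "no"},
    },
    "above": {"label": "no"},
}


def explain(tree, school):
    """Walk the tree and return (label, list of plain-language decisions)."""
    steps = []
    node = tree
    while "label" not in node:
        value = school[node["feature"]]
        if value < node["threshold"]:
            steps.append(f"{node['feature']} = {value} is below {node['threshold']}")
            node = node["below"]
        else:
            steps.append(f"{node['feature']} = {value} is at least {node['threshold']}")
            node = node["above"]
    return node["label"], steps


# Hypothetical school record.
label, steps = explain(TREE, {"pct_level4_math": 4.0, "attendance_rate": 85.0})
for step in steps:
    print(step)
print("underperforming:", label)
```

Each printed step is one split on the path from the root to the leaf, which is the kind of trace that can be read aloud to a partner organization.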
The PASSNYC competition highlights the city’s underserved schools geographically, demographically, economically and over time. The “Recommender” system’s results show schools grouped into 3 categories, roughly equating to: 1) students with high academic scores who don’t take the SHSAT; 2) students with average academic scores, where focused instruction could yield more students eligible for the Specialized High Schools; 3) students with below-average academic scores, where the objective is academic intervention to bring students up to standard.
Outreach to students in the first group, who are academically prepared, is a second-order problem requiring awareness and test preparation. The second and third groups require more in-depth academic intervention and preparation during grade school to bring the majority of students up to academic standards and to give the top students among these underserved groups the opportunity to prepare for the SHSAT as well.
To the extent that such interventions occur, the demographic balance of the Specialized High Schools will become more aligned with the city’s demographics and these schools will retain their merit-based selectivity.